
    Addressing Function Approximation Error in Actor-Critic Methods

    In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning by taking the minimum value between a pair of critics to limit overestimation. We draw a connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI Gym tasks, outperforming the state of the art in every environment tested.

    Comment: Accepted at ICML 2018
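    The two mechanisms the abstract names translate into a very small amount of code. The sketch below is a minimal illustration, not the authors' released implementation; every function name is a placeholder, and the networks are passed in as plain callables.

```python
# Minimal sketch of the abstract's two mechanisms: clipped double-Q targets
# and delayed policy updates. All names are illustrative placeholders.

def clipped_double_q_target(reward, done, next_state, actor_target,
                            critic1_target, critic2_target, gamma=0.99):
    """Bootstrap from the minimum of a pair of target critics, which
    limits the overestimation a single critic would accumulate."""
    next_action = actor_target(next_state)
    q_min = min(critic1_target(next_state, next_action),
                critic2_target(next_state, next_action))
    return reward + gamma * (1.0 - float(done)) * q_min

def train(num_steps, update_critics, update_actor_and_targets, policy_delay=2):
    """Delayed policy updates: the critics train every step, while the
    actor and target networks update only every `policy_delay` steps,
    reducing per-update error in the policy gradient."""
    for step in range(num_steps):
        update_critics()
        if (step + 1) % policy_delay == 0:
            update_actor_and_targets()
```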

    Cost Adaptation for Robust Decentralized Swarm Behaviour

    Decentralized receding horizon control (D-RHC) provides a mechanism for coordination in multi-agent settings without a centralized command center. However, combining a set of different goals, costs, and constraints into an efficient optimization objective for D-RHC can be difficult. To address this problem, we use a meta-learning process -- cost adaptation -- which generates the optimization objective for D-RHC to solve based on a set of human-generated priors (cost and constraint functions) and an auxiliary heuristic. We use this adaptive D-RHC method for control of mesh-networked swarm agents. This formulation allows a wide range of tasks to be encoded and can account for network delays, heterogeneous capabilities, and increasingly large swarms through the adaptation mechanism. We leverage the Unity3D game engine to build a simulator capable of introducing artificial networking failures and delays in the swarm. Using the simulator, we validate our method on an example coordinated exploration task. We demonstrate that cost adaptation allows for more efficient and safer task completion under varying environment conditions and increasingly large swarm sizes. We release our simulator and code to the community for future work.

    Comment: Accepted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018
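    The abstract does not spell out the adaptation rule, so the sketch below is speculative: the weighted-sum objective and the heuristic-driven weight update are assumptions made purely to illustrate how cost priors could be assembled into a D-RHC objective.

```python
# Speculative sketch of cost adaptation. The weighted-sum form and the
# update rule are illustrative assumptions, not the paper's exact method.

def adapted_objective(cost_priors, weights):
    """Assemble the D-RHC optimization objective from human-generated
    cost/constraint priors, each scaled by an adapted weight."""
    def objective(trajectory):
        return sum(w * cost(trajectory)
                   for w, cost in zip(weights, cost_priors))
    return objective

def adapt_weights(weights, heuristic_scores, step_size=0.1):
    """Meta-learning step: raise the weight of terms the auxiliary
    heuristic flags (e.g., safety margins eroded by network delay)."""
    return [max(0.0, w + step_size * score)
            for w, score in zip(weights, heuristic_scores)]
```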

    Bayesian Policy Gradients via Alpha Divergence Dropout Inference

    Policy gradient methods have had great success in solving continuous control tasks, yet the stochastic nature of such problems makes deterministic value estimation difficult. We propose an approach which instead estimates a distribution by fitting the value function with a Bayesian neural network. We optimize an α-divergence objective with Bayesian dropout approximation to learn and estimate this distribution. We show that using the Monte Carlo posterior mean of the Bayesian value function distribution, rather than a deterministic network, improves the stability and performance of policy gradient methods in continuous control MuJoCo simulations.

    Comment: Accepted to the Bayesian Deep Learning Workshop at NIPS 2017
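    The Monte Carlo posterior mean the abstract refers to amounts to leaving dropout active at evaluation time and averaging stochastic forward passes. The sketch below shows only that evaluation step; the network shape and dropout rate are assumptions, and the α-divergence training objective itself is omitted.

```python
import torch
import torch.nn as nn

# Sketch of MC-dropout value evaluation. The MLP architecture and dropout
# rate are illustrative assumptions; training is not shown.

class DropoutValueNet(nn.Module):
    def __init__(self, obs_dim, hidden=64, p=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs):
        return self.net(obs)

def mc_posterior_mean(value_net, obs, n_samples=50):
    """Monte Carlo posterior mean: keep dropout stochastic at evaluation
    time and average many forward passes, rather than relying on a single
    deterministic prediction."""
    value_net.train()  # leaves the dropout masks stochastic
    with torch.no_grad():
        samples = torch.stack([value_net(obs) for _ in range(n_samples)])
    return samples.mean(dim=0)
```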

    Normalizing Flow Ensembles for Rich Aleatoric and Epistemic Uncertainty Modeling

    In this work, we demonstrate how to reliably estimate epistemic uncertainty while maintaining the flexibility needed to capture complicated aleatoric distributions. To this end, we propose an ensemble of Normalizing Flows (NFs), which are state-of-the-art in modeling aleatoric uncertainty. The ensembles are created via sets of fixed dropout masks, making them less expensive than creating separate NF models. We demonstrate how to leverage the unique structure of NFs -- their base distributions -- to estimate aleatoric uncertainty without relying on samples, provide a comprehensive set of baselines, and derive unbiased estimates for differential entropy. The methods were applied to a variety of experiments commonly used to benchmark aleatoric and epistemic uncertainty estimation: 1D sinusoidal data, a 2D windy grid-world (Wet Chicken), Pendulum, and Hopper. In these experiments, we set up an active learning framework and evaluate each model's capability at measuring aleatoric and epistemic uncertainty. The results show the advantages of using NF ensembles in capturing complicated aleatoric distributions while maintaining accurate epistemic uncertainty estimates.
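    The fixed-dropout-mask construction can be illustrated with a toy example. In the sketch below, a two-layer network stands in for a full normalizing flow (which would be far more involved); only the mechanism of frozen per-member masks over shared weights is shown.

```python
import numpy as np

# Toy illustration of fixed-mask ensembling; the tiny network is a stand-in
# for a normalizing flow, not an implementation of one.

rng = np.random.default_rng(0)

def make_masks(n_members, hidden_dim, keep_prob=0.9):
    """Sample each member's dropout mask once and freeze it, so all
    members share one set of weights instead of being separate models."""
    return rng.random((n_members, hidden_dim)) < keep_prob

def member_forward(x, w1, w2, mask):
    hidden = np.maximum(0.0, x @ w1) * mask  # frozen mask selects this member's subnetwork
    return hidden @ w2

def epistemic_spread(x, w1, w2, masks):
    """Disagreement across the fixed-mask members serves as the epistemic
    uncertainty signal (aleatoric uncertainty would instead come from each
    flow's base distribution)."""
    preds = np.array([member_forward(x, w1, w2, m) for m in masks])
    return preds.var(axis=0)
```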

    Leveraging World Model Disentanglement in Value-Based Multi-Agent Reinforcement Learning

    In this paper, we propose a novel model-based multi-agent reinforcement learning approach, named Value Decomposition Framework with Disentangled World Model, to address the challenge of multiple agents pursuing a common goal in a shared environment with reduced sample complexity. Due to the scalability and non-stationarity problems posed by multi-agent systems, model-free methods rely on a considerable number of samples for training. In contrast, we use a modularized world model, composed of action-conditioned, action-free, and static branches, to unravel the environment dynamics and produce imagined outcomes based on past experience, without sampling directly from the real environment. We employ variational auto-encoders and variational graph auto-encoders to learn the latent representations for the world model, which is merged with a value-based framework to predict the joint action-value function and optimize the overall training objective. We present experimental results on Easy, Hard, and Super-Hard StarCraft II micro-management challenges to demonstrate that our method achieves high sample efficiency and exhibits superior performance in defeating enemy armies compared to other baselines.

    Comment: 14 pages
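    The three-branch decomposition the abstract describes can be sketched structurally. The interfaces below and the additive fusion of the three latents are assumptions made for illustration, not the paper's API.

```python
# Speculative sketch of the three-branch world model; branch interfaces and
# the additive fusion rule are illustrative assumptions.

class DisentangledWorldModel:
    """Action-conditioned dynamics, action-free dynamics, and a static
    branch for time-invariant features of the environment."""

    def __init__(self, action_branch, free_branch, static_branch):
        self.action_branch = action_branch    # driven by the joint action
        self.free_branch = free_branch        # evolves independently of actions
        self.static_branch = static_branch    # does not change over time

    def imagine_step(self, latent, joint_action):
        """One step of imagined rollout, used to train the value-based
        framework without sampling from the real environment."""
        z_action = self.action_branch(latent, joint_action)
        z_free = self.free_branch(latent)
        z_static = self.static_branch(latent)
        return z_action + z_free + z_static   # assumed fusion: simple sum
```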